Frame-Wise Cross-Modal Matching for Video Moment Retrieval


Abstract

Video moment retrieval aims to retrieve a golden moment in a video for a given natural language query. The main challenges of this task include 1) the requirement of accurately localizing (i.e., finding the start time and the end time of) the relevant moment in an untrimmed video stream, and 2) bridging the semantic gap between the textual query and the video contents. To tackle those problems, early approaches adopt a sliding window or uniform sampling to collect video clips first and then match each clip with the query to identify relevant clips. These strategies are time-consuming and often lead to unsatisfactory localization accuracy due to the unpredictable length of the golden moment. To avoid these limitations, researchers have recently attempted to directly predict the relevant moment boundaries without the requirement to generate video clips first. One mainstream approach is to generate a multimodal feature vector for the target query and video frames (e.g., by concatenation) and then use a regression approach upon the multimodal feature vector for boundary detection. Although some progress has been achieved by this approach, we argue that those methods have not well captured the cross-modal interactions between the query and video frames. In this paper, we propose an Attentive Cross-modal Relevance Matching (ACRM) model which predicts the temporal boundaries based on interaction modeling between the two modalities. In addition, an attention module is introduced to automatically assign higher weights to query words with richer semantic cues, which are considered to be more important for finding relevant video contents. Another contribution is that we propose an additional predictor to utilize the internal frames in model training to improve the localization accuracy. Extensive experiments on two public datasets, TACoS and Charades-STA, demonstrate the superiority of our method over several state-of-the-art methods. Ablation studies are also conducted to examine the effectiveness of the different modules in our ACRM model.
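The general idea in the abstract (weighting query words by attention, scoring each frame against the query, then reading off temporal boundaries) can be illustrated with a toy sketch. This is not the ACRM model: the function names, the linear attention scorer `w`, the use of cosine similarity as the relevance score, and the thresholded maximum-sum span used in place of a learned boundary predictor are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_query(word_feats, w):
    """Pool word features into one query vector.

    word_feats: (num_words, dim) array of word embeddings.
    w: (dim,) hypothetical scoring vector (learned in a real model).
    Returns the attention-weighted query vector and the word weights.
    """
    alpha = softmax(word_feats @ w)   # one weight per word, sums to 1
    return alpha @ word_feats, alpha

def frame_relevance(frame_feats, query_vec):
    """Cosine similarity between every frame feature and the query vector."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    return f @ q

def best_span(scores, threshold=0.5):
    """Return (start, end) frame indices, inclusive, of the contiguous span
    maximizing the sum of (score - threshold). This is a simple stand-in
    for a learned boundary predictor."""
    best, span = -np.inf, (0, 0)
    cur, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        gain = s - threshold
        if cur <= 0:
            cur, cur_start = gain, i   # start a new candidate span here
        else:
            cur += gain                # extend the current span
        if cur > best:
            best, span = cur, (cur_start, i)
    return span

# Toy data: three query words and six frames in a 2-D feature space.
words = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
frames = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0],
                   [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]])
query_vec, alpha = attend_query(words, w=np.array([2.0, 0.0]))
start, end = best_span(frame_relevance(frames, query_vec))
```

In this toy setup the first and third words receive the larger attention weights, so frames aligned with the first feature axis score highest and define the retrieved span. A real system would learn the features, the attention parameters, and the boundary predictor jointly.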


Similar resources

Cross-modal Embeddings for Video and Audio Retrieval

The increasing number of online videos brings several opportunities for training self-supervised neural networks. The creation of large-scale video datasets such as YouTube8M allows us to deal with this large amount of data in a manageable way. In this work, we find new ways of exploiting this dataset by taking advantage of the multi-modal information it provides. By means of a neural net...


Cross-Modal Manifold Learning for Cross-modal Retrieval

This paper presents a new scalable algorithm for cross-modal similarity preserving retrieval in a learnt manifold space. Unlike existing approaches that compromise between preserving global and local geometries, the proposed technique respects both simultaneously during manifold alignment. The global topologies are maintained by recovering underlying mapping functions in the joint manifold spac...


MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval

Cross-modal retrieval has drawn wide interest for retrieval across different modalities of data (such as text, image, video, audio and 3D model). However, existing methods based on deep neural network (DNN) often face the challenge of insufficient cross-modal training data, which limits the training effectiveness and easily leads to overfitting. Transfer learning is usually adopted for relievin...


An Efficient Adaptive Boundary Matching Algorithm for Video Error Concealment

Sending compressed video data in error-prone environments (like the Internet and wireless networks) might cause data degradation. Error concealment techniques try to conceal the errors in the received data on the decoder side. In this paper, an adaptive boundary matching algorithm is presented for recovering the damaged motion vectors (MVs). This algorithm uses an outer boundary matching or directional tempo...


OPTIMAL DESIGN OF COLUMNS FOR AN INTERMEDIATE MOMENT FRAME UNDER UNIAXIAL MOMENT AND AXIAL LOADS

The present study addresses the optimal design of reinforced concrete (RC) columns based on equivalent equations, considering the deformability regulations of ACI318-14 under axial force and uniaxial bending moment. Contrary to common approaches that rely on trial and error in design, this study first presents an exact solution for the intensity of longitudinal reinforcement in the column section by...



Journal

Journal title: IEEE Transactions on Multimedia

Year: 2022

ISSN: 1520-9210, 1941-0077

DOI: https://doi.org/10.1109/tmm.2021.3063631